-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
[#38390] Bug fix for Datetime64Formatter with values of ndim > 1 #38391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#38390] Bug fix for Datetime64Formatter with values of ndim > 1 #38391
Conversation
@@ -3106,6 +3106,43 @@ def format_func(x): | |||
result = formatter.get_result() | |||
assert result == ["10:10", "12:12"] | |||
|
|||
def test_datetime64formatter_2d_array(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need any tz-aware cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea. It's likely to fail the same way so I'll add that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/io/formats/format.py
Outdated
if not isinstance(values, DatetimeIndex): | ||
values = DatetimeIndex(values) | ||
if not isinstance(flat_values, DatetimeIndex): | ||
flat_values = DatetimeIndex(flat_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeIndex
appears to only support 1-d values. It will actually construct with dim > 1 but can error at some places, so pass flattened values just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use DatetimeArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is a little cleaner since DatetimeArray._format_native_types
is the only thing being used here. It still might be good to call values.ravel()
here since it's needed if self.formatter
is set, although not really necessary otherwise.
pandas/io/formats/format.py
Outdated
flat_str_values = np.array([self.formatter(x) for x in flat_values]) | ||
fmt_values = flat_str_values.reshape(values.shape) | ||
else: | ||
fmt_values = flat_values._data._format_native_types( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will actually call ravel()
/reshape()
itself, but flattened values are needed above so easier to use that here as well.
Thanks for the review @jbrockmendel , sorry for the delay. I updated with a fix and tests for Datetime64TZFormatter as well, since the same bug happens. |
75c381c
to
da0f037
Compare
The remaining test failures look like they are unrelated from a network timeout |
values = DatetimeIndex(values) | ||
values = np.asarray(self.values) | ||
flat_values = values.ravel() if values.ndim > 1 else values | ||
flat_values = DatetimeArray(flat_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeArray can handle 2D values, so some of the ravel/ndim checks might not be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I noticed it's able to handle 2D values, but the extension array that I was fixing this for can have higher dimensions, so it would still need to check above that plus when self.formatter
is set. It seems cleaner to flatten at the beginning and work with flattened values. And I think calling ravel()
a second time would be a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK no problem
@@ -242,6 +242,7 @@ I/O | |||
- Bug in :func:`read_csv` not accepting ``usecols`` with different length than ``names`` for ``engine="python"`` (:issue:`16469`) | |||
- Bug in :func:`read_csv` raising ``TypeError`` when ``names`` and ``parse_dates`` is specified for ``engine="c"`` (:issue:`33699`) | |||
- Allow custom error values for parse_dates argument of :func:`read_sql`, :func:`read_sql_query` and :func:`read_sql_table` (:issue:`35185`) | |||
- Bug in :class:`Datetime64Formatter` that caused error on string representation with extension types of datetime64 values and ndim > 1 (:issue:`38390`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not user facing (as its internal), is there anything that is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bug was found when calling repr()
on an ExtensionType
, I will put this in terms of that instead.
if not isinstance(values, DatetimeIndex): | ||
values = DatetimeIndex(values) | ||
values = np.asarray(self.values) | ||
flat_values = values.ravel() if values.ndim > 1 else values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use the @ravel_compat
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that would work. The only thing is that if self.formatter
is set, then it would add an unnecessary conversion to an numpy.ndarray
and then back again to list
. I'm not sure how crucial that case is, would you like me to go ahead and change it anyway?
|
||
if self.formatter is not None and callable(self.formatter): | ||
return [self.formatter(x) for x in values] | ||
fmt_values = [self.formatter(x) for x in flat_values] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not using just the _format_native_types here? this makes this way more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, which line are you referring to? Did you mean instead of calling self.formatter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is that self.formatter may be user-supplied at some point in the process?
more importantly, shouldn't this actually be handled on the EA itself? |
Thanks for reviewing @jreback , I just had a couple questions.
This would be preferred, but I could not find a way to do this. When calling
Is there a way for the ExtensionArray to tie into this somehow? |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
I am still planning on working this. I noticed there is a similar problem with extension arrays with floating point values of ndim > 1, so I will work on a fix that will hopefully handle both of these cases soon. |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
I haven't had the time to get back to this, will close for now and reopen when I can update |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This change fixes a bug in
Datetime64Formatter
that errors with values of ndim > 1. This was discovered when using an ExtensionArray with 2D values of datetime64 and attempting to format/display. The formatter currently allows for 2D values, but will end up returning nested strings instead of a flat list of string. The same error occurs when using theDatetime64Formatter
directly on 2D values.The fix is to flatten the values to get a flat list of formatted datetime64 strings, then reshape, and then format a second time over the outer dimension to produce the properly nested list format and the final flat list of strings.